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Abstract. Since the innovation of the ubiquitous Kalman filter more than five decades back it is 
well known that to obtain the best possible estimates the tuning of its statistics Xo, Po, ©, R and 
Q namely initial state and covariance, unknown parameters, and the measurement and state noise 
covariances is very crucial. The earlier tweaking and other systematic approaches are reviewed but 
none has reached a simple and easily implementable approach for any application. The present 
reference recursive recipe based on multiple filter passes through the data leads to a converged 
‘statistical equilibrium’ solution. It utilizes the pre, post, and smoothed state estimates and their 
corresponding measurements and the actual measurements as well as their covariances to balance 
the state and measurement equations and form generalized cost functions. The filter covariance at 
the end of each pass is heuristically scaled up by the number of data points and further trimmed 
to provide the Po for subsequent passes. A simultaneous and proper choice for Q and R based on 
the filter sample statistics and certain other covariances leads to a stable filter operation providing 
the results after few iterations. When only R is present in the data by minimizing the ‘innovation’ 
cost function J using the non filter based Newton Raphson optimization results served as an 
anchor for matching and tuning the filter statistics. When both R and Q are present in the data 
the consistency between the injected noise sequences and their statistics provided a simple route 
and confidence in the present approach. A typical simulation study of a spring, mass, damper 
system with a weak non linear spring constant shows the present approach out performs earlier 
techniques. The Part-2 of the paper further consolidates the present approach based on an analysis 
of real airplane flight test data. 

Keywords. Adaptive EKF, Expectation Maximisation, Maximum Likelihood, Covariance match¬ 
ing, Recursive parameter estimation, Cramer Rao Bound, Probability Matching Prior. 


1 Introduction 

Kalman proposed the solution for the linear filtering problem in his famous 1960 paper which is 
known as Kalman filter(KF). It is not well known to many that the enthusiasm which followed 
soon after Kalman introduced his filter was damped since the statistics of the process (Q) and 
measurement noise (R) had to be provided to design and implement the filter. Gauss had a 
relatively ideal situation with a good system model and only the measurements had noise and 
thus with his least squares approach he could get an estimate and a qualitative measure for the 
uncertainty. Kalman when he proposed the filter dealt with only state estimation. Presently 
the scale and magnitude of many difficult and interesting problems that estimation theory (ET) is 
handling could not have been comprehended by Gauss or Kalman. In many present day applications 
one does not even know quite well the structure of the state and measurement equations as well as 
the parameters in them and the statistical characteristics of the state and measurement noise. It is 
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possible to add the unknown initial conditions of the state as well. One can summarize that almost 
nothing is known but everything has to be determined or estimated from the measurements alone! 
This means the connecting relationship between the state and the measurement has to be found 
out. This has to be from previous knowledge or intuition such that it is meaningful, reasonable, 
acceptable, or useful. Even this has to be achieved only by the internal consistency of the various 
quantities and/or the variables that occur at different times during the filter operation through 
the data. Further the systems are large and the best possible optimal estimation has to be worked 
out in real time. However due to the enormous computing power that is presently available it 
has been possible to handle the above situations. The other unknowns include system parameters 
(0), the initial state Xo and its covariance Po- Tuning is done manually even today since the 
adaptive filtering method had not matured to a routine procedure or recipe. This is in spite of 
its applications in many fields of science and engineering including among many in airplane flight 
test data analysis (Klein and Morelli 2006), target tracking (Bar-Shalom et al. 2001), evolution 
of the space debris scenario (Ananthasayanam et al. 2006), fusion of GPS and INS data (Grewal 
et al. 2007), study of the tectonic plate movements (Kleusberg and Teunissen 1996), high energy 
physics (Fruhwirth et al. 2000), agriculture, biology and medicine (Federer and Murtliy 1998), 
dendroclimatology (Visser and Molenaar 1988), finance (Wells 1996), source separation problem 
in telecommunications, bio medicine, audio, speech, and in particular astrophysics (Costagli and 
Kuruoglu 2007), and atmospheric data assimilation for weather prediction (Evensen 2009). 

Instead of tweaking, an adaptive and systematic estimation of Xo, Po, 0, R and Q is known as 
the problem of tuning the Kalman filter. In the present work, a reference recursive recipe (RRR) 
for tuning the Kalman filter is proposed. A simultaneous and proper choice for Q and R based on 
the filter sample statistics and certain other covariances leads to a stable filter operation providing 
the results after few iterations. We have also suggested an alternative statistic for the estimation 
of Q based upon the difference between the stochastic and dynamic trajectories (see Section-5.4). 
The RRR attains statistical equilibrium after few filter iterations (without any direct optimization) 
over the data together with internal consistency checks through different cost functions. The cost 
functions (see Section -5) act as a pointer for the user to decide on the convergence of the filter which 
can otherwise be deceptive. Another important factor which is generally ignored is the Cramer 
Rao Bound (CRB) of the unknown parameters. This paper also discusses the importance of tuning 
the Po and a simple heuristic scaling up method (see Section-5.2) is proposed for achieving the 
CRBs. 

The layout of the paper (Part-1) is as follows. The problem is set up in an extended Kalman filter 
(EKF) framework in Section-2. The filter tuning and its importance is discussed in Section-3. A 
review of the earlier adaptive tuning methods and the present approach is discussed in Section-4 
and 5 respectively. The Section-6 discusses the results of the proposed reference recursive recipe 
(RRR) applied to a simulated spring mass and damper system with weak nonlinear spring constant. 
The concluding remarks are made in Section-7. The Part-2 of the paper shows the efficacy of the 
proposed RRR on more involved real airplane flight test data. 


2 Extended Kalman Filter Equations 

We consider an Extended Kalman Filter (EKF) formulation as it provides the best scenario since 
other filter formulations contain the effect of approximations, discretization and other features. 
Consider the following well known discrete time nonlinear filtering problem given by 

x k = f(x k -i,0,u k -i)+ w k (1) 

Z k = h{x k , ©) + v k , k = 1, 2,..., N (2) 

where ‘x’ is the state vector of size n x 1, ‘u’ is the control input and ‘Z’ is the measurement 
vector of size m x 1. The ‘f’ and ‘h’ are non linear functions of state and measurement equations 
respectively. The process noise, w k ~ J\f ( 0, Q) and the measurement noise, v k ~ J\f ( 0, R ) 
are assumed to be zero mean additive White Gaussian Noise (WGN). The Normal or Gaussian 
distribution is represented by ‘A/ - ’ and its assumed that, 

E [wk'wj] = Q S(k — j ) with E = 0 
E [ffcuj] = R S(k — j) with E [vk] = 0 
E[w k vj] = 0 V j, k = 1, 2,...,N 
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where E[ ] is the expectation operator, 6 is Kronecker delta function defined as 


S(k-j) 


0 if k^j\ 
1 if k = j. 


In the EKF formulation the parameter vector ‘0’ of size p x 1 is augmented as additional states 
leading to, 
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The non linear filtering problem is now redefined as 

X k = f(X k _ 1 ) + w k (3) 

Z k = h(X k )+v k , k = l, 2,...,N (4) 

where ‘X’ and ‘w’ are respectively the augmented state and process noise vector is of size (n+p) x 1 
and thus w k ~ J\f ^0, ^ 0 )' contr °l input ‘u’ and the ‘hat’ symbol for estimates are not 

shown for brevity. A solution to the above problem is summarised in Brown and Hwang (2012), 


Initialisation : Xo — E [X t o]. P o — E [(Xo — X t o)(Xo — X t o) T ] 

Prediction step : X k]k _ x = f{X k _ l]k _ x ), P fc | fc _i = Fk-\Pk-i\k^ F k -\ + Q 

Update step : K k = P k {H k P klk _ 1 Hj^ + R ) -1 

X k \ k = X k \ k _i + K k {Z k — h(X k \ k _i)), P k \ k = (I — K k H k )P k \ k _ 1 

where all the symbols have their usual meaning and 


True initial state 
Initial state estimate 
Initial state covariance matrix 

State Jacobian matrix 

Measurement Jacobian matrix 

The above steps that statistically combine two estimates at any given time point, one from state 
X k \ k _i and the other from measurement Z k equation are formal if only their uncertainties denoted 
by their covariances are given. The states can be estimated given the initial Xo and Po over a time 
span along with the process noise input with covariance Q and updated with measurements with 
noise covariance R. In order to match or minimize the difference v k = Z k — h(X k ^ k _ 1 ) called the 
innovation in some best possible sense the estimates from the states and measurements over a length 
of data a well known criterion is the Method of Maximum Likelihood Estimation (MMLE). The 
innovation follows a white Gaussian distribution (Kailath 1970) which is operationally equivalent 
to minimizing the cost function 

J =-^^2vk{H k P k \ k _ 1 H k +R) _1 z/ k T 

=J(X o ,Po,Q,R,0) 

=J(Xo, 0, K (traded for P„, Q, R)) 

based on the summation over all the N measurements and thus solving for either Xo, Po, Q,R, 0 
or solving for X 0 , 0, K as the case may be. When Q = 0, the MMLE is called as the output 
error method with the Kalman gain matrix being zero. In the usual Kalman filter implementation 
generally one does not solve for the statistics Po, Q and R but tweak manually to obtain acceptable 
values. The numerical effort of minimizing J has to appear in the estimation of the filter statistics. 
The Kalman filter is not a panacea to obtain better results when compared to simpler techniques 
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of data analysis. The accuracy of the results using Kalman filter depends on its design based on 
the choice of Xo, Po, 0, R and Q. If the above values are not chosen properly then the filter 
results can be inferior to those from simpler techniques. 

There are five steps in the Kalman filter, namely state and covariance propagation with time, 
Kalman gain calculation and the state and covariance updates by incorporating the measurement. 
In the filter statistics approach all the five steps have to be gone through. The state propagation 
and update refer to the sample while the covariance propagation, update, and the Kalman gain 
refer to the ensemble characteristics. 

In the present work we introduce a generalized cost function by an expansion of the usual ‘inno¬ 
vation’ to other quantities generated by the Kalman filter such as the prior and posterior state 
respectively before and after the measurement is assimilated, the smoothed state after all the mea¬ 
surements are processed. We demonstrate that it is such an approach that decisively indicates the 
best possible solution in the simulated data analysis and more so in the analysis of real flight test 
data. 

3 Timing of the Kalman Filter Statistics 

Some fundamental differences exist between the cost function handled by classical optimization 
problems and the Kalman filter. The former deals with a static cost function, with a fixed model 
and a finite amount of data, with deterministic unknowns. In contrast the Kalman filter has 
to grapple with a dynamical time varying cost function due to possibly time varying state and 
measurement model structure and their parameters, with increasing number of measurements with 
the unknowns being both deterministic and the statistics of probabilistic variables. Obviously the 
effort to be put in minimizing J cannot be swept under the rug! Generally one manually tweaks the 
statistics to reach acceptable results instead of tuning properly to get even better results. In spite 
of its immense applications for more than five decades in many problems of science and technology 
the filter tuning has not matured to a simple and easily implementable approach for any problem. 
Even a routine adaptive filtering technique to estimate a constant signal with measurement noise 
does not appear to exist! 

The ghost of filter tuning chases every variant and formulation of the filter be it EKF, Unscented 
Kalman Filter (UKF), Particle filter (PF), or the Ensemble Kalman Filter (EnKF) or their combi¬ 
nations. If one desires to obtain near optimal estimates then the filter statistics have to be properly 
tuned. In the absence of proper filter tuning it is difficult to infer if the performance of the variants 
of Kalman filter are due to their formulation or filter tuning! 

In the best spirit of the estimation theory and in particular the recursive Kalman filter approach 
even if X 0 , Po, 0, R and Q are unknown or inaccurately known the filter should still have the 
ability to estimate all the above from the measurements Z and perhaps commencing not too far 
from their proper values. The filter results should prove to be self consistent and estimate all the 
unknowns. Generally one tunes the filter statistics off line using simulated data and later use it to 
process real data on line or even in real time. 

The filter tuning varies from ad hoc, through heuristic to rigorous methods. Generally the tuning is 
manual or with adhoc quick fix solutions such as limiting P from going to zero, or add Q to increase 
P before calculating the gain, multiply P by a factor to limit K all have obviously limitations in 
handling involved problems or scenarios. In the fading memory filter the R and Q estimates are 
based on the weighting between the current data and previous estimate. All the above introduce 
additional parameters to be adjusted that varies for every problem. The rigorous approach could 
be hard and time consuming to the extent of solving the whole problem. It is the middle path 
of heuristic approaches which is quite appealing and followed in the present work. The adaptive 
approach or filter tuning tries to obtain the filter statistics X 0 , Po, 0, R and Q by using the 
various quantities arising in the filter while operating over the measurement data. 


3.1 Qualitative Features of the Filter Statistics 

Should the Po = Q = 0 then the filter will learn nothing from the measurements and ignore it. 
The R is fairly objective and can be determined from the measured data. The Po is tricky and 
generally the off diagonal elements are set to zero and the diagonal elements are set to large values 
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but however their relative values are crucial for an optimum filter operation. The Po controls 
the handover from the initial transient to Q that control the steady state filter behavior. The Q 
though considered notorious helps to inject uncertainty into the state equations to assist the filter 
to learn from the measurements. Even when only the measurement noise is present in the data, 
starting with some initial estimate X 0 somewhat away from the true values, in order to assist the 
filter to learn from the measurements a non zero Q has to be injected into the state equations 
since the effect of Po will fade away quickly. A large value of Q will lead to a short transient with 
large steady state uncertainty of the estimates and vice versa for small Q. In the present work 
since both Xo and Po are simultaneously tuned for data without process noise the requirement 
for injecting Q does not arise. 

It is only by Q an analyst can handle the unmodelled or unmodellable errors. The process noise can 
estimate, account or offset for some deficiency, inaccuracy, or error in the initial conditions, system 
and measurement model equations, control or external input, measurement noise statistics, the 
numerical state and covariance propagation or update operations and even account for numerical 
errors. Though Q is considered notorious it is the life line of the Kalman filter to do good work. 
The Q is helpful to track systems whose dynamical equations are unknown. Some classic examples 
are the GPS receiver clocks, satellite, trajectory of aircraft, missiles and reentry objects. These are 
handled by using the kinematic relations between the position, velocity, acceleration, and even jerk 
(Mehrotra and Malrapatra 1997) all assumed (or even estimated by postulating stochastic models 
with floated parameters) to be driven by white Gaussian noise Q of suitable magnitude to enable 
the filter to track these systems. The process noise inhibits the onset of instability. 

There could be different methods of implementing the filter for any practical situation. One could 
work with time varying full, diagonal, or constant matrices Q and R, or work with constant Kalman 
gain matrix, or with important Kalman gain matrix elements. The constant Q and R matrices 
approach is generally preferable and the constant gain approach is more easily implementable 
though not necessarily optimal. 


3.2 Some Simple Choices for Initial Xo for States and Parameters 

For both the states and parameters the choice of Xo appears to be relatively weak for the filter 
estimates but the choice of Po is very important in particular for the parameter uncertainty 
represented by Cramer Rao Bound (CRB). Generally even with not quite a good tuning of the filter 
statistics the estimates could be near the optimum but lead to large variation in the uncertainty 
represented by the CRB. However it may be cautioned that for data with only measurement 
noise if a Newton Raplison (NR) technique is used to minimize the cost function the effect of 
the initial unknown state can affect the estimates but can be handled if this is also treated as an 
unknown. In the case of the filter by the process of smoothing the unknown initial state can be 
also estimated. 

Since some of the states are generally measured either the first or the average of the first few 
measurements can be taken as the initial value Xo for the state. The parameters are used as 
augmented states in the EKF route. Since either some computational or experimental results are 
available, these can be set as initial X 0 for the parameter values. At times if possible a least square 
parameter estimates based on balancing the governing differential equations using the appropriate 
measurements can also be used as start up values. 


3.3 Importance of Initial Po for States and Parameters 

If Po is set equal to zero (for a very confident choice for the initial estimates) then the filter ignores 
and learns nothing from the measurements. If Po is extremely large (a pessimistic choice for the 
initial values), then the filter believes the measurements much more and provides very little weight 
or ignores the state model values leading to large fluctuations in the state and parameter estimates 
along with large final uncertainty. Thus there appears to be a proper choice for Po which is neither 
zero nor infinity to provide proper estimates and uncertainty. 

This Po is one of the important tuning parameters as stressed by very few like Maybeck (1979), 
Candy (1986), and Gemson (1991) but most people treat it casually. Usually one assumes a guess 
Po which tends to become very low after some data points and in order to make the filter learn 
from the subsequent measurements introduce an additional Q by trial and error into the state 
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equations. This finally leads to some estimates and uncertainties. The former usually may be close 
but the latter generally away from the correct values. The peculiar situation is one has introduced 
Q even when there is no model structure uncertainty. In the present recursive procedure a proper 
Po without any Q is shown to be possible. 

The important point is that the initial state and parameter P 0 can affect the final covariance 
(Pn\n) from the filter operation. This can be crucial in certain state estimation problems such as 
impact point estimation and its uncertainty for target tracking. In such cases the results for the 
uncertainty from improper or inaccurate tuning can be deceptive. Even in parameter estimation 
problems the estimates and the assigned uncertainties can be important in the design of control 
systems. 


3.4 Choice of Initial R and Q 

Usually a good initial estimate for R can be obtained from the calibration of the measuring 
instrument and generally it is assumed to be constant over the data length. In the NR procedure 
of minimizing the cost function based on data without process noise even R can be treated as an 
unknown and estimated along with other unknown quantities. In principle the Q should reflect 
the uncertainty in the assumed state model or any ‘unmodellable’ feature of the state or even 
unknown random state input. The Q along with the initial Po plays a very important role in the 
filter operation without divergence. The value of Q should be small enough to retain the learning 
potential from the measurement but not large to make the filter estimates useless. 


3.5 Simultaneous Estimation of R and Q 

When the data contains the effect of both the measurement and process noise it becomes far more 
difficult and notorious for analysis. In general minimization of any cost function by treating R 
and Q as unknowns during which the simultaneous update and estimation of R and Q appears 
to be difficult as reported by many researchers in the field. For a given R changing the sequence 
of measurement noise makes the dynamics noisy meaning more blurred. In contrast the varying 
sequence of process noise makes the dynamical system wander randomly. The interesting point is 
the filter by tracking the drifted dynamical behaviour even with large Q estimates the parameters 
controlling the original dynamics of the system without the effect of R and Q. Since R and Q 
occur respectively in the measurement and state equations their effects are negatively correlated. 
The Q represents the average rate of change of the state, tracking ability by indirectly controlling 
the rate at which the old data are forgotten, and affects the estimation accuracy. The effect of 
R is opposite to that of Q. Thus during simultaneous recursive estimation if the statistics for 
estimating them are not properly chosen then R is over estimated and Q is under estimated and 
vice versa. This is just the reason due to which Gernson (1991) in his approach had to update R 
and Q alternately. 


3.6 Tuning Filter Statistics with only R 

In the spirit of recursive filtering operations we have sought to tune all the filter statistics by 
repeated iterative processing of the data without any direct optimization procedures. When only 
measurement noise is present in the data it is possible to get both the estimates and the CRB by 
minimizing the cost function J formed by the difference of the estimated and the actual measure¬ 
ment. The NR procedure carries out the above task very efficiently for simple to involved systems 
(Ananthasayanam et al. 2001). The results from the above NR procedure served as an anchor for 
tuning the filter parameters to get the closest possible estimates and the CRB from the present 
recursive procedure. It appears to be not feasible to reproduce the exact NR results from the filter 
for each and every data but sufficiently close. 

3.7 Tuning Filter Statistics with both R and Q 

However when process noise is also present in the data without solving the involved optimiza¬ 
tion problem we look for consistency based on a comparison between the injected and estimated 
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sequences of R and Q. Further the additional cost functions proposed in our work based on bal¬ 
ancing the state and measurement equations helped to obtain confidence in the results. When the 
filter operates through the data it generates prior, post, and smoothed state estimates and their 
covariances which help to generate candidate ‘statistic’ to estimate both R and Q. From these 
one can also generate additional cost functions to see how best the state equations are balanced. 
Then all the above state estimates and covariances can be transformed into measurement space to 
be compared once again with the actual measurements and its covariance to generate candidate 
‘statistic’ to estimate R. These help to form more cost functions to see how best the measurement 
equations are balanced. It is necessary to choose from among the above ‘statistics’ the proper 
combination of ‘statistic’ for simultaneously estimating R and Q. Any improper combination does 
not lead to proper filter operation and even if it does leads to inappropriate results as is the case of 
some earlier approaches to solve the filter tuning problem. The subsequent sections after a review 
of earlier approaches propose the present method. This paper demonstrates the results using the 
simulated data of a spring, mass, and damper system with a weak non linear spring followed by 
real data in the second part of the paper. 


4 Review of Earlier Adaptive Kalman Filtering Approaches 

One brute force method for tuning filter statistics is to carry out an optimization exercise to solve 
for the statistics based on minimizing a suitable cost function over thousands of combinations of 
the unknowns. These would be too unwieldy requiring such a massive numerical exercise to be 
carried out for each and every new problem scenario. We should thus look for elegant approaches 
which help tune the filter with reasonable numerical effort. In particular in EKF if the unknown 
noise covariances are incorrectly specified biased estimates can arise (Ljung 1979, Ljungquist and 
Balchen 1994). Even when the system parameters are known, if an inaccurate description of the 
noise statistics are used the filter may give poor estimates, or even diverge. The following sections 
discuss the four broad adaptive filtering techniques. 


4.1 Bayesian Method 

Every update in a Kalman filter is obviously a Bayesian update. The work of Alspach (1974) deals 
with a bank of autonomous Kalman filter run with a range of Kalman gains. Each one stores a 
running sum of the square of the residuals. Subsequently it is possible to obtain the estimates of the 
unknowns based on a weighted sum over the grid points of the gain. Hilborn and Lainiotis (1969) 
show that a Bayesian optimal adaptive estimation system converges to the average performance of 
an (unrealizable) optimal system operating with knowledge of the parameters which are unknown 
to a Bayesian adaptive/learning system. 


4.2 Maximum Likelihood (ML) and Expectation Maximization (EM) 

The ML and EM methods need the specification of the probability distribution (usually a Gaussian) 
followed by the difference between the measurement and the model prediction called innovation 
or prediction error. The maximum likelihood methods (Kashyap 1970, Bohlin 1976) maximize the 
likelihood function based on the innovations containing the unknown covariances Q and R. Usually 
time consuming gradient based numerical optimisation or other schemes that reprocess the data 
several times are required to estimate the unknown covariances. 

In order to avoid nonlinear optimization in the ML approach Shumway and Stoffer (2000) proposed 
an iterative method using the expectation maximization (EM) technique. The EM consists of 
expectation and maximization steps. First the states are estimated using an initial guess of the 
unknown parameters based on a Kalman smoother. The unknown parameters are next estimated 
by ML method. This process of estimating the states using the Kalman smoother and optimizing 
the parameters using is repeated until parameter convergence. 

Bavdekar et al. (2011) used both a direct optimization method and an extended EM method for 
nonlinear systems, based on the extended Kalman filter. Their first approach directly optimizes the 
likelihood function of the innovation sequence generated by the EKF using sequential quadratic 
programming. The second approach using the extended EM method, maximizes the likelihood 
function of the complete data set of the states and measurements in an iterative procedure. 
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Recently Zagrobelny and Rawlings (2014) similarly proposed a ML method to identify Q and 
R assuming the parameters are known. By writing the outputs in terms of the process and 
measurement noises, they form a normal distribution for the sequence of measurements. The 
variance of this distribution is a function of the unknown noise covariances, and the likelihood is 
optimized with respect to these covariances. Simulations are used to compare the ML method 
to several existing methods like the autocovariance least squares (ALS) method, an alternate ML 
method based on the innovations, and an EM method. 


4.3 Covariance Matching 

During the filter pass over the data a number of random variables such as pre, post, and smoothed 
states arise and these transformed to the measurements along with actual measurements many more 
statistics are available. Combining these one can form innovation, residue, smoothed residues and 
so on. As mentioned earlier among the five filter equations two refer to the sample and the other 
three refer to the ensemble characteristics. If the filter is performing well then the sample statistics 
formed by the above should be internally consistent with their ensemble properties also provided 
by the filter. These can be matched by running many Monte Carlo simulations. However taking 
sample statistics at various time instants during a filter pass one can form equations connecting 
the many random variables, their combinations and their covariances to estimate the unknown 
covariances. The different approaches of Myers and Tapley (1976), Gemson (1991), Mohamed and 
Schwarz (1999), Bavdekar et al. (2011), and the present approach are examples for the covariance 
matching. Different statistics are chosen either over a window or the full data length, after each 
pass for estimating R and Q. These estimates could be used in subsequent passes as is necessary. 
Myers and Tapley (1976) approach using the innovation (available before update) can at times make 
the R lose its positive definiteness. This is because R depends on the difference of two matrices 
which could become negative. In order to overcome such an eventuality they proposed a remedy of 
simply turning the estimate as positive! Mohamed and Schwarz (1999) have suggested a more stable 
statistic based on the residue (available after update) for estimating R which is the sum of two 
matrices. The R estimates are generally more accurate than Q. Simultaneous estimation of R and 
Q is not easy perhaps as they negatively affect the filter performance. These covariance matching 
techniques are preferred due to their simplicity and speed despite being suboptimal. Gemson (1991) 
and Gemson and Ananthasayanam (1998) showed the importance of Po for parameter estimation. 
Similarly many statistics are available and are utilized for estimating Q. 


4.4 The Correlation Technique 

The innovation theorem of Kailath (1970) states that the innovation sequence is zero mean white 
Gaussian. Kailath further stated that if any gain other than the optimal is used then estimates 
will be suboptimal and the innovation mean will be non zero and the autocorrelation will not 
follow the Kronecker delta function. Also if the covariance of the innovations are not as expected 
they are indicative of the choice of any or all of the system matrices as well as the covariances are 
incorrect. 

The correlation technique pioneered by Mehra (1970, 1972) and Carew and Belanger (1973) and 
Belanger (1974) is the earliest and highly cited method for determining the unknown covariances. 
Mehra’s approach is based on the properties of the innovations process that must be white and 
Gaussian. Starting from an assumed value for the unknown R and Q an initial estimate for the 
steady state Kalman gain is obtained. This sequence is checked to see whether the particular 
Kalman gain implemented generates a statistically acceptable white noise sequence. However the 
Kalman gain can take correct value even when R and Q are far away from their true values. This 
is because different combinations of R and Q can lead to the same gain. 

Later Neethling and Young (1974) noted large covariances from Mehra’s approach due to some 
other deficiencies. The approach of Oussalah and De Scliutter (2000) improves Mehra’s (1970) and 
Carrew’s and Bellanger’s (1973) approaches by incorporating information about the quality of the 
innovation estimates leading to a weighted least squares methodology. The weights are determined 
using a Bhattacharyya distance criterion between the ideal probability and the distribution refer¬ 
ring to the current first and second order statistics of the autocorrelation functions. The latter 
helps to generate a convergent sequence to the steady state filter, which after some manipulations 
allows one to determine the values of the noise statistics Q and R. 



The latter work of Odelson et al. (2006) showed based on some counter examples the mathematical 
conditions regarding the system and measurement matrices are not sufficient and not necessary in 
Mehra’s work (1970). Also they showed that the variance estimates of Mehra are larger and often 
negative as well that is unacceptable. They proposed a method called constrained Autocovariance 
Least Squares (ALS) method corrected the above and obtained a much smaller and better estimates 
and none negative as well. 


4.5 Other Approaches 

Valappil and Georgakis (2000) used the available information about the model parametric uncer¬ 
tainties and translate this information to the process noise covariance Q. They propose two methods 
called linearized and Monte Carlo approaches. They also assume that R is readily available from 
the measuring instrument characteristics. The Monte Carlo simulations are run with parameter 
values sampled from the assumed normal distribution, with means and covariances. Then the 
difference between the nominal and the random dynamical state trajectory over many simulations 
is taken to provide the process noise at any time instant to be used in the filter equations. 

Some attempts have been made like Powell (2002) using the simplex method, Oshman and Slraviv 
(2000) using the genetic algorithm, and controlled random search (Anilkumar 2000). However 
when the dimension, nonlinearity, and the range of search space become large these could become 
computationally prohibitive and could lead to local minimum. 

Manika Saha et al. (2014) felt that X 0 and Po are not critical since they affect only the initial 
transient filter performance and Po has to be decided by designers since P changes as the filter 
operation proceeds reaching a steady state if only the system dynamics does not substantially 
change and a suitable choice of R can be easily made. Thus for a chosen suitable values for 
Xo, Po, and R they identified the critical Q by forming two metrics based on the innovation 
covariance. These vary from zero to the number of measurements and vice versa as Q changes 
from zero to infinity they proposed that Q can be chosen around the intersection point of these 
two cost functions. 

Lau and Lin (2011) also discuss the limitations of simulated annealing and particle optimization 
techniques for filter tuning. One can summarize that deterministic or probabilistic optimization 
approaches do not appear to be efficient for solving the filter tuning problem. Hence we tried 
to see if a recursive filtering approach would work and fortuitously it had worked and will be 
demonstrated subsequently. 


5 Present Recursive Reference Recipe for Tuning Filter Statistics 

Fundamentally the Estimation Theory (ET) is an optimization problem. Hence a suitable cost 
function J has to be chosen. Essentially there are two elements in ET (i) Defining a cost function, 
(ii) Adopting a suitable algorithm to minimize the cost function. The Likelihood cost (L) for 
normally distributed error (e) (Bohn 2000) is given by 

1 N 1 

L(©) = n ^2 2 ( efe ) 7 Ak 1 ( efe ) + l°g(det{ A k )) (5) 

fc=l 

where A is the error covariance matrix and det(A) represent determinant of matrix A. It may be 
noted that the unknown parameters (0) occurs implicitly and not explicitly in the cost function L. 
Since the filter provides many quantities it is possible to have many more terms in the cost function 
which is a function of errors in the initial state and parameter estimates, process noise driving 
the system and the measurement noise introduced by the measurement system, each weighted 
appropriately through suitable weighting covariance matrices. Thus the new cost function J in a 
weighted least square sense accounts for (i) A priori knowledge about the initial estimates, and (ii) 
Balancing the measurement equations, (iii) Balancing the system equations. One can call this as 
generalized MLE (J), 


J = J0 +J1+J2 + J3 +J4 + J5 + J6 + J7 +J8 


( 6 ) 
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whose terms are defined as 


JO =i(Xo - X t0 ) T P o - 1 (X o - Xto) 

1 N 

31 =N ^ Zk ~ K X k\k-i)) T Sl k \Z k - h(X k | fc _!)) 

k= 1 

1 N 

32 =n ^ Zk ~ h (Xk\ k )) T S2-\Z k - h(X k , fc )) 

v k =1 

1 * 

J3 =]v ~ ft(^|iv)) T S3fe :l (Z fc - h(X k |jv)) 

fe = l 

1 ^ 

J4 =— ^2,{Z k - h(Xd k \ N )) T (Z k — h(Xd k \ N )) 
v k —l 

1 * 

J5 =— '^JyZk — h(X k \ k _- l )) T Sl k 1 (Zfc — M^fclfc-i)) + /o5(det(51fe)) 

v fc=l 

1 * 

J6 =]y5Z wl fc|A rW1 fc lw;1 fe|^ 

fe=i 

1 ^ 

u '2|’| Ar W / 2 fc 1 u>2 fe | A r 

v fc =1 

1 * 

J 8 = jy lw3 fc|fc 

fe=i 

The ‘S’ and ‘W’ are functions of the second order moments given by Shyam et al. (2015) 

Sl k =H k Pk\ k -iH k + R 

S2 k = - H k \ k P k \ k H^ k + R 
S3 k = — H k \js[P k \ k -\H^ N + R 

Wife = — Pk\N ~ Fk-l\NPk-l\k-lF k -l\N + Pk,k-l\NFk-l\N + —l|Ar-^fc—11JV + Q 

lT2fc = — P k I JV — Fdfc-llJvPfc-llfc-l^-llJV + ^,fc-l|iV^Ll|iV + P/^fc-l|iV-P^fc-l|.?V + Q 

lT3fe =Pfe|fc_i — Pfc|fc 

If the initial states are known then JO is not necessary but if they are unknown, their estimate 
and covariance can be obtained respectively by the smoothed estimates X 0 |n and Po|n- The cost 
Jl, J2 and J3 are expected to tend towards the number of measurements (m). The cost J6, J7 
and J8 defined for states with process noise are expected to tend towards the number of states 
(n). The cost J4 is expected to tend towards the trace of R for Q = 0 case and J5 is the negative 
log likelihood function. 


5.1 Choice of Xo 

Commencing from an assumed reasonable initial choice for Xo, Po, 0, R and Q the first filter 
pass through the data is made. Then a backward smoothing procedure is carried out. Rauch, 
Tung and Striebel (RTS, 1965) smoother was used as a standard smoothing procedure throughout 
the present work. The smoothing leads to the best possible state and parameter estimates as well 
as their covariances. It may be noted after smoothing the state estimates and their covariances 
change but not that of the parameters. We next describe how the above choices are updated for 
further filter passes through the data to eventually converge which denotes statistical equilibrium. 
If the exact value of X 0 if not given it can be obtained by the smoothed estimates. In such 
probabilistic approaches any number of unknowns are only determined in a probabilistic sense and 
thus contain uncertainty due to the percolation of all the noise effects over all the unknowns. The 
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RTS smoothing equations for discrete time instants k = N-l, N-2,... ,0 are given by 


Kk\N — -Ffc|fe-Pfc-Pfe_|_i|fc 'j 

X k \N = x m + K klN (X k+m - X k+1]k ) > (7) 

F*k\N P k \k + Kk\N{Pk+l\N P k +1\k)-^-k\N J 

where K^, and P k \N is the smoothed gain, smoothed state estimate and smoothed state 
covariance matrix respectively. 


5.2 Choice of P 0 

The choice of Po for the next filter pass is very tricky. If one were to take the smoothed initial 
state covariance (Po|jv) and use it as the Po for the next pass then the final covariance keep on 
decreasing with further filter passes and eventually tend towards zero. We know that such a fact 
is unrealistic. In order to remedy the above behaviour the final covariance at the end of the pass 
was scaled up by the number of data points (N) and used at the beginning of the next pass. The 
only heuristic reasoning that can be provided from statistics is based on the fact that the mean 
from a sample has an uncertainty P that keeps decreasing with sample size as P/N where P is 
the population variance. Since in the filter steps the estimates and its update refer to the sample 
and the other covariance propagation, its update, and the calculation of the Kalman gain refer to 
the ensemble characteristics before every filter pass we carry out the above scale up method to 
obtain the Po for the next filter pass. We also propagated backwards the final covariance using 
the estimated parameters, and it turned out that there was not much of a difference with the Po 
that was used in the forward pass. The scaled up Pq (Shyarn 2014) is given by 


Po = N x P n\n 


( 8 ) 


The usual recommendation (Mehra 1970) when the states are measured is to set Po = R. Even 
by using the Inverse of Information Matrix (IIM) approach of Gernson (1991) obtained the same 
estimate for Pq as R. The IIM is given by 


Po = 


1 N 

-Y, F ^ H kR- 1 H k F k ^ l 


(9) 


Of course the above process does not end and we have to further trim the above Po to obtain the 
best possible CRB after some passes. The scaled up Po is a full matrix. Many changes such as 
using only the diagonal elements and many more variations were tried out. Finally the reference 
Po to obtain the proper CRB for the parameter estimates turns out to have all the elements 
are zero (the covariance of all the states and their cross covariance with the parameters as well) 
except the diagonal elements corresponding to the parameters. If all the elements of the parameter 
covariances were included and the state and its cross covariances set to zero, it did not make much 
of a difference in the final results. 


5.2a Probability Matching Prior Interpretation for Po 

If the deterministic NR optimization approach is considered to provide frequentist results then the 
Kalman filtering approach corresponds to the Bayesian route. It is useful to recall the immense 
amount of study for the choice of suitable prior distributions in the Bayesian approach for inference 
in statistics (Datta and Mukerjee 2005). The choice of appropriate probability distribution for Po 
is the probability matching prior (PMP) that provides a bridge between the above approaches to 
provide comparable results. 

Consider the simple case of a constant signal with noise. In the frequentist approach the calculation 
of the mean and standard deviation is direct. However in the Bayesian approach the above result 
is not reachable unless a proper Po is chosen which is the probability matching prior and in our 
case the tuning. Further we have in addition the process noise input to be handled. 
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5.3 Estimation of R and Q using EM Method 

The expression for R and Q using the Expectation Maximisation (EM) method extended to a 
non-linear system by Bavdekar et al. (2011) is given by, 


[ V k v k\ Z N\ 

k =1 
1 N 

Q = l WkW k \ Z n] 


( 10 ) 

( 11 ) 


5.3a Estimation of R 

Consider the measurement noise, v k = Z k — h(X k ) which can be approximated using first order 
Taylor series expansion around the smoothed estimate, X k \ x given by 

Vk « Z k — h(X k \ N ) — H k \ N X k \ N 


where H k \ N — §jt\ x =x klN an< ^ X k \N — X k — X k \ N 

v kV k = Z k Z k - Z k h T (X k jjv) — Z k X k \ N H k \ N — h{X k \ N )Z k + h(X k \ N )h T (X k \ N ) 

+ h(X k \ N )X k \ N H k \ N ~ H k \ N X k \ N Z k + H k \ N X k \ N h T (X k \ N ) + H k \ N X k \ N X k \ N H k \ N 

We know that 


E[X k \ N \ — E[X k — X k \ N ] — 0 

E[X k]N X% N \ = E[(X k - X k{N )(X k - X k \ N ) T ] = P k{N 
Thus the conditional expectation is given by 

E \v k v k \Z N \ = Z k Z k - Z k h T (X k |jv) — h{X k \ N )Z k + h{X k \ N )h T (X k \ N ) + El k \ N P k \ N H k \ N 
Rearranging the above terms and using Eq-10, we get 

1 N 

R- = ^2 {(^fc - h(X klN ))(Z k - h(X k \ n )) t + H k \ N P k \ N H k \ N ^ (12) 


5.3b Estimation of Q 


Consider the process noise, w k = X k — f(X k _ i) which can be approximated using first order Taylor 
series expansion around the smoothed estimate, X k _u N given by 


W k S=S X k — f(X k _i |jv) ~ —l|iV 

where F k _ 1 \ N = §x\x=x k _ llN anc ^ -^-k-i\N — X k ~1 — 

w k wl = x k xl - X k f T (X k _ m ) - f(X k _ m )X% + f(X k _ m )f T (X k _ m ) 
- XkXZ_ m F£_ m + /(X fc _ 1 | A r)Xj_ 1 | 7 V F^l 1 | A r - F k _!\ N X k _i\ N X k 
+ E k _i\ N X k _ l \ [N f(X k _i\ N ) T + F k _i\ N X k _ 1 \ N X k _ 1 \ N F k _ 1 \ N 

The following results are used in calculating the conditional expectation in Eq-11 

E[X k X k \Z N } = X k \ N X k , N + P k \ N ) 

E[X k X k _ x \Z N \ = X klN X k _ 1{N + P k<k -i\N J 


(13) 


(14) 
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The lag-one covariance, i\fc_i|/v for k = N-l, N-2,..., 1 is given by 

Pk,k-1\N = E[(X k — X k \ N )(Xk-l — X k _i\ N ) T ] 

Pn,n-i\n = {I ~ KnHn)Fn-iPn-i\n-i 

Pk,k-1\N = Pk\kE-k-l\N + E k \ N (Pk+l,k\N ~ FkPk\k)K k _l\ N 

where K k \w is the smoothed gain at discrete time ‘k’ obtained from the RTS smoothing algorithm- 
5.1. Using Eq-13 and Eq-14 we get the conditional expectation as 

E [w k w k \Z N ] = X k \ N X k \ N + P fc |jv + X k \ N f T (X k _x\ N ) — f(X k _i\N)X k \ N 

+ f{X k -i\ N )f T (X k _ 1 \ N ) — Pk,k-l\NF k _ 1 \N — Plk-l\ N Pk-l\N + 

Rearranging the above terms and using Eq-11, we get 

l N 

Q = Jj ^2i wl k\NWl k ^ N + P k \ N + F k _11AT P k — 11 AT F h — 11 N ~ P k,k-l\N F k-l\N ~ P k,k-l\N F k-l\N} ( 15 ) 
k= 1 

where wl k \ N = X k \ N - f{X k _x\ N ). 


5.4 The Proposed DSDT Method for Estimating Q 

We now estimate Q in a different way using the difference between the stochastic and dynamical 
trajectory (DSDT) method by following the extended EM method. The stochastic trajectory 
with the process noise can be approximated using the first order Taylor series expansion around a 
nominal point ( X n ) as 

= /(*„,,t) + /'(A', lf . q)(.V, , - A',,,..,) + w k (16) 

where f represents partial differentiation operation on f and thus f = • Consider the 

dynamical trajectory (Xd) without the process noise defined as 

Xd k = f{Xd k - 1) 

Xd k = /(X nfe _J + /'(X^XXdfc-r - x nfc _j (17) 

where and Xdo = X 0 . It is assumed that the nominal point (X n ) of both the above trajectories 
are close to the estimated dynamical trajectory (X nk ss Xd k |jy) where Xd k \ N = f{Xd k _x\N ) and 
Xdo\N = Xqijv- Subtracting Eq-17 from Eq-16 we get 

X k — Xd k =f\Xd k _i\N)(X k _i — Xd k _i\ N — Xd k _i + Xd k _\| jv) + w k 

w k =X k — Xd k — Fd k -i\N(Xk-i — Xd k - 1 ) (18) 

where the dynamical state Jacobian, Fd k -\jjv = §x\x=xd. , • 

w k wl = X k Xl - X k Xdl - X k Xl_xFdl_x + X k Xdl_xFdl_ x - Xd k Xl + Xd k Xd T k 
+ X d k X k _ 1 F'd k _x — Xd k Xd k _xFd k -i — Fd k _i\pjX k -iX k + Fd k _^ N X k -iXd^ 

+ Fd k -i\NX k -iX k _ 1 Fd k _ 1 — Fd k -i\NX k -iXd k Fd k _x + Fd k -i\iyXdk-iX k 
— Fd k _i\ N Xd k -iXd k — Fd k _i\ [N Xd k -\X k _ 1 F k _ l ^ N + Fd k _i\ N Xd k -iXd k Fd k _ x 

Additionally apart from Eq-14, we have the following results 

E[X k Xd T k \Z N ] = E[X k \Z N }E[Xdl\Z N \ = X k]N Xd k]N ) 

E[Xd k XdL k \Zisi\ = Xd k \fjXdi^ N -\- Pd k \]si l (19) 

E[XdkX^k_x\ZN\ = Xd k \ N Xd k _^ N + Pd k)k -i\ N J 

where Xd k \^ = f(Xd k _ ujv) is the predicted dynamical state trajectory without the measurement 
and process noise using the estimated parameter, Qn\n- Using Eq-14 and Eq-19 we get 

E \w k w k \Z N \ =w2 k \ N w2 k \ N + Pfc|Ar + Fd k _ 1 \ N P k _n N Fd k _i\ N — Pk, k -i\NFd k _^ N 
— Fd k _i\ N P kk _^ N + Pd k jjv + Fd k _i\ N Pd k _n N Fd k _i\ N 
— Pdk,k-i\NFd k _^ N — Fd k -%\ffPd^ k _x\ N (20) 
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where vu2 k \ N = X k \ N — Xd k m — Fd k _ k \N{X k -i\N — Xd k _i ijy). Consider the following term, 


Xd k - Xd k |jv = f(Xd k - 1 ) — f(Xd k -i\N) 

~ f(Xd k - m ) + Fd k _i\]y(Xd k -i — Xd k _i j jv) — f{Xd k _ i|jv) 

~ Fd k _ 1 \ N (Xd k -i - Xd k _i |jv) (21) 


Using Eq-21, we get the covariance of the dynamical trajectory as 


Pd k \ N =E[(Xd k — Xd k \^){Xd k — Xd k \N) T ] 
=Fd k _ m Pd k _ m Fd k _ m 

where Pdo|Af = Pq\n since Xdo\N = Xqijv. The lag one covariance of dynamical trajectory is 


Pd k ,k-i\N —F[{Xd k — Xd k \ N ){Xd k -\ — Xd k _ k \ N ) T ] — Fd k _ 1 \ N Pd k _i\ N 
Substituting the value of Pd k k _^\ N and Pd k |jy in Eq-20 and using Eq-11 we get 

l N 

Q =— ^2^ W ^ k \ NW2 k\N + P k\N + P < tfc~l|iV-Pfc-l|;vt 7 ’ Cl|V — Pk,k- 1 \ N F d k _ 1 |jy — Pl k _ 1]N F jv} 

k= 1 

If Q = 0 then X = Xd and assuming that P 0 |at ~ 0, R can be estimated as 

1 N 

IT ~ — h(Xd k \ N ))(Z k — h(Xd k \ N )) T } 

v fc=l 


( 22 ) 


(23) 


5.5 Choice of R 

The choice of R for the next filter pass can utilize one appropriate among the many that are 
possible. Bavdekar et al. (2011) used the smoothed residue Z k — h(X k \ N ) for R estimation using 
extended EM method given by, 

1 N 

IT = ^ ^2 — h(X k \ N )){Z k — h(X k \ n )) t + -fffc|;v-Pfc|jv-£ffc|jv} (24) 

v fe=l 

The choice of Mohamed and Schwarz (MS) for R estimation based on the filtered residue is 

1 N 

R = n ^ [ {Zk ~ M**!*))^* - KX k \k)) T + H k \ k P k \ k Ht lk } (25) 

v fe=l 

The choice of Myers and Tapley (MT) for R estimation based on the innovation is 
1 N 

R = n ^2 { (Zfc - M^*|*-i))(^fc - KX k \ k -i)) T - HkPk^Hl) (26) 

fc=l 

Thus the above three equations use respectively smoothed, after update, and before updated 
states, the measurement and their corresponding covariances. All the measurement noise statistics 
innovations, filtered residue, smoothed residue are assumed to be zero mean. The smoothed residue 
is the best statistic for R estimation. 

5.6 Choice of Q 

The choice of Q for the next filter pass can utilize one appropriate among the many that are 
possible. Bavdekar et al. (2011) used the smoothed statistic X k \ N — f(X k _\\ N ) for the Q estimation 
using extended EM method given by, 

i N 

1 r—> | rjp rjp rj-t r £i 1 

Q = ~j^ 2_y + P k\N + F k — 11 _ZV P k — 11 iV F k _ j | - P k,k-1\N F k -1\N ~ P k,k -11 N F k -11 JV j (27) 

fc=l 

where wl k \ N = X k \ N - f{X k _!\ N ). 
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The new DSDT statistic for Q (Section-5.4) is 
1 N 

Q ^2 { w2 k\N w2 k\N + P k\N + Fd k-l\N P k-l\N Fd k-l\N ~ P k,k-l\N Fd k-l\N ~ P k,k-l\N Fd k-l\N^ ( 28 ) 
k =1 

where w2 k \N = ^k\N ~ Xd k \N ~ Fdk-i\N{Xk-i\N — Xd k _i\pf). Mohamed and Schwarz (MS) used 
innovations and gain for estimating Q given by, 

Q = I<N | ^ - h (Xk\k-i))(Z k - h(X k | fc _ 1 )) T | K t n (29) 

The choice of Myers and Tapley (MT) for Q is w3 k \ k = X k \ k — X k ^ k _ 1 and is given by, 

1 N 

Q = ^ ^ {Fk-iPk\k-\F k ~i - p k\k) } (30) 

V k —1 

All the process noise samples, wl k \N,w2 k \N and w3 k \ k are assumed to be zero mean. We note 
that the smoothed statistics wl k \N and w2 k \jy provide very close results and are the best for the 
Q estimation. 

Much of the earlier work used simulated data to tune the filter off line for obtaining the statistics to 
be used later for on line and real time applications. While this is correct one important feature has 
been forgotten in such approaches. The convergence of any technique even in simulation studies is 
no guarantee for a proper solution to the problem. Even the simple case of a linear fit to a set of data 
many variants tend to different results (TR/EE2015/401 [?]). Hence we still lean on simulation 
studies with exact solutions being available to the analyst. Hence presently the filter methods have 
been applied firstly to a very simple spring, mass, and damper system for which when only the 
measurement noise exists and the process noise does not exist (Q = 0). For such a situation exact 
reference results are derivable by using the Newton Raphson (NR) technique (Ananthasayanam 
et al. 2001) and these can be compared with filter generated results. Subsequently when the 
process noise in included in the system it is necessary to look for the consistency based on a 
comparison between the injected measurement and process noise sequence and their statistics. 
Further many other cost functions based on balancing the states and measurement equations that 
are introduced help to move from deceptive to decisive solutions. The estimation of R and Q 
is possible provided one utilizes appropriate choice of the estimation ‘statistics’ based on many 
quantities that arise in the filter operation like the pre and post filter states as well as the ones 
derivable from the measurements such as the innovation, filtered residue, smoothed residue and 
their covariances. 

The above helped to guide us in the choice of appropriate initial filter statistics namely Xo, Po,0, 
Q and R. In particular after the first filter pass (using some initial guess statistics) through the 
data which is a must it has guided the way the filter statistics have to be updated from the second 
iteration onwards. It helps to answer the questions like what should be the P 0 for the states and 
the parameters be derived from the value at the end of the pass to be used in the next iteration, 
should they be full, diagonal, or even zero, whether the Q has to be injected into the state and/or 
the augmented parameter states. Since the reference NR estimates of the parameters, the Cramer 
Rao bound (CRB), and the measurement noise are available it has been possible to settle such 
questions and in particular the CRB played a crucial role to guide the choices for the above. 
Generally the Kalman filter provides fairly acceptable estimates for the parameters but unless the 
tuning is good it is very difficult to match the CRBs from the EKF with the NR values. 


5.7 Adaptive Tuning and the present Reference Recursive Recipe 


The different methods and options for filter tuning forms a part of sensitivity study which are, 

1. Po can be estimated by Scale up, IIM or by Smoothing (Po|jv)- 

2. Options for P 0 which can be split as cov([State-S;Parameter-P]) are 

(a) Reference Matrix-[0,0;0,/], 

(b) Diagonal matrix - [/,0;0,/], 

(c) Full matrix - [/,/;/,/]. 
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The checkmark (/) represent a nonzero value at the indicated position. The ‘cov (.)’ repre¬ 
sents covariance matrix. 

3. The process noise Q can be estimated by using Eq-27, Eq-28, Eq-29 or Eq-30. 

4. Options for Q = cov([S;P]) also include the same options used for P 0 except that the refrence 
matrix is [/,0;0,0]. 

5. The measurement noise R can be estimated by using Eq-24, Eq-25 or Eq-26. 

The following steps explain the recursive or iterative algorithm for tuning the EKF. 

1. Given the system model and the measurements the first iteration of EKF is carried out with 
guess values of X 0 , Po, ©, R and Q. 

2. Run the extended RTS smoother using the filtered data to get the smoothed state estimate 

and the corresponding smoothed covariance Pk\N- 

3. The Po can be estimated by Scale up (Eq - 9), IIM (Eq - 8) or the smoothed (Po|at) which 
will have to be scaled up and modified for obtaining proper results. 

4. The measurement and process noise covariance can be estimated by any of the options as 
discussed in Section-5.5 and 5.6. 

5. EKF is run using the updated estimates of X 0 , Po, 0 , Q and R at the beginning of further 
iterations until statistical equilibrium is reached. 

6. Different cost functions (J1 to J8) are checked for convergence. 

7. Many simulation runs (say 50) are carried out by varying the injected measurement (v) and 
process noise (w) sequences. 

For the Q = 0 case the value of Q is set at 10~ 10 or lower for all iterations to help the filter that 
would otherwise generate a pseudo Q and then slowly grind it to zero in hundreds of iterations. 
For the Q > 0 case if any of the states is known to have Q = 0 then it can be set at 10“ 10 or 
lower. For Q = 0 case one can even estimate R by ignoring the second order terms (assuming P—» 
0). It is of interest to note that for Q > 0 case unless the second order terms of the filter output 
covariance terms are also included in (i) the estimate for R and Q using the EM option and (ii) 
the estimate for R using the EM together with Q using the DSDT option the estimates for R and 
Q do not converge to the proper value. The different adaptive approaches analysed in the paper 
are provided in Table-1. Based on a comparative study the proposed RRR is as below. 


The proposed Reference Recursive Recipe (RRR) 


Q = 0 

Q > 0 

Xo : Given or X 0 \ N 

0 : 0at|at 

Po : Scaled up-[0,0;0,/] 
Q : 10" lo -[/,0;0,0] 

R : EM-diag 

Xo : Given or X Q \ N 

0 : Qn\n 

Po : Scaled up-[0,0;0,/] 

Q : EM/DSDT-[/,0;0,0] 
R : EM-diag 


We call this as a reference and not as a standard since improvements could be made later when 
such a recursive filtering approaches match the solutions provided by optimization techniques like 
NR or other involved ones including Q. The Xo in all cases is either given or obtained by the 
smoother. The smoothed initial estimate which can be split as X 0 |at = (xo|jv, 0 o|at) T including 
both state and parameter. The estimated parameter is taken as Qn\n obtained from Xpf |jy = 
[zjviiv, 0 jv|jv] with covariance Pq obtained at the end of the final filter pass over the data, Pn\n 
, PxS , Pqx , Pq] • 

Generally mathematical treatments implicitly assume unlimited data. However, in practice we 
have to deal with finite lengths of data and so need stable convergence properties reasonably 
independent of starting estimates. In any field of science or technology one needs at times stability 
and at other times sensitivity. In the present RRR there is stability in the procedure commencing 
from different reasonable initial values converging to the same result for a given measurement data. 
Should the measurement data change then the converged parameters also change (!) thus showing 
sensitivity. 
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6 Simulation Study of a Spring, Mass and Damper System 

Consider the SMD system with weak nonlinear spring constant in the continuous time (t) state 
space form given by 


±i(t) = x 2 (t) 

± 2 (t) = -OiXi{t) - Q 2 x 2 (t) - ©3 xl(t) 


where X \, x 2 are the displacement and velocity state with initial conditions 1 and 0 respectively. 
The x represents differentiation with respect to time (t). The unknown parameter vector is 0 = 
[©i,02 ,©3 ] t with the true values Qtrue = (4, 0.4,0.6) T . The ©3 is a weak parameter since its 
value do not affect the system dynamics much. The complete state vector, X=[x 1 , x 2l ©i, 0 2 , © 3 P 
of size (n +p) x 1. The measurement equation is given by 

Zk = HXk + Vk 


where H= 


1 

0 


0 

1 


0 0 
0 0 


is the measurement matrix of size m x (n + p) where m = n = 2 and 


p = 3. The values of the noise variances are R = diag(0.001,0.004) and Q = diag(0.001,0.002) 
where ‘diag’ is the diagonal operator as used in MATLAB®. All the figures are presented for only 
one simulation run to prevent cluttering. In the SMD system study, the guess value of Po chosen 
is 10 _1 for all states which is assumed to be a diagonal matrix in the first iteration. The guess 
value of Q chosen is 10 _1 for all states and zero for the augmented parameters. The guess value 
of R chosen is 2 _1 for all measurement channels. The initial parameters are chosen to be within 
±20% error. A total of N = 100 measurement data are simulated with the time varying from 0 to 
10 seconds in very small steps of <5t = 0.1 s. For zero process noise case, the maximum number of 
iteration is set to 20 over n s =50 simulations and for non zero process noise case it is set to 100 over 
50 simulations for obtaining generally four digit accuracy (though not necessary) in the results as 
presented in the Table-2a and 3. In the present reference procedure it was noticed that generally 
even if the initial state covariance, initial process and measurement noise variances are varied over 
a wide range of powers from -3 to ±3 together with the initial parameter values being set to zero 
one can reach the same estimation results for a given data. Thus there is stability with respect to 
far off initial conditions but sensitivity in the estimates with different data. Such studies show that 
the present reference recursive recipe leads to a non diverging, and consistent filter performance 
over many simulations and provides better results when compared to earlier approaches. The 
convergence of the following quantities through the iterations are analyzed. 


1. The parameter estimates 0 and their covariances P©. 

2. The noise covariance of Q and R. 

3. The state dynamics without measurement and process noises based on the estimated param¬ 
eter after the filter pass through the data Xd, the prior state X-, the posterior state X±, the 
smoothed state Xs and the measurement Z. 

4. The sample innovation, filtered residue and the smoothed residue along with their bounds 
which is the square root of the predicted covariances given respectively by (R +Hk.Pk\k-iH^), 
(R - H k \ k Pk\kH^ k ) and (R - H k \ N Pk\ N H^ N ) by the filter. 

5. The estimated measurement and process noise samples as well as their autocorrelations. 

6 . The cost functions (J1-J8) after the final convergence. 


6.1 Discussion of the Results 

The system under consideration is first solved using the recursive reference recipe (RRR). Later 
other sensitivity studies were carried out using other possible variations for Po, Q and R. Such 
studies lead to the conclusion that the reference recipe is just about the best possible and others 
may not always help the filter operation without divergence and even if they did may not lead 
to results as good as the reference recipe. The tabulated results are averaged over 50 simulations 
which includes the following 
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• 0 ratio EKF/NR is the ratio of EKF estimated parameter (0jv|jv) to that of the parameter 
estimated by NR method. 

• 0 ratio EKF/True is the ratio of EKF estimated parameter (@n\n) to that of 0 irue . 

• CRB ratio is the ratio of the square root of the parameter covariance (P©) estimated by EKF 
to that of the CRB estimated by the NR method. 

• Consistency ratio = cr®/SIGMA avg 


where er© = 


E (0 s - Q) 2 


SIGMA avg = ^ E y/Pg, n s is the total number of 


simulations, s is the simulation number, 0 is the sample mean of the estimated parameters. 

• Spread factor is a measure of percentage spread seen in the parameter estimates using both 
first and second order moments which is defined as 


Spread factor = 


/-W<e - by + CS 


100 

W\ 


• R ratio EKF/True is the ratio of the EKF estimated R to that of the true value of R. 

• R ratio EKF/NR is the ratio of the EKF estimated R to that estimated by NR method. 

• Q ratio is the ratio of the EKF estimated Q to that of the true Q. 

• The mean (/z) and standard deviation (a) of the cost functions (J1-J8) are over many 
simulations (n s =50). 


6.1a Without Process noise (Q = 0) 


The Table-2a and 2b show the results for the Q = 0 case. The appropriate reference results are 
shown inside the thick block. The later rows present results with different options. One can note 
that the parameter estimates, the ratio of R, and the cost functions are fairly comparable for 
different options. However it is the ratio of the filter estimated CRBs, (whose effect is amplified 
and shown) by the consistency ratio and the spread factor that differ from the corresponding NR 
estimates. This feature indicates that the reference procedure with its choice of Po, 0, Q and R 
updates is about the simplest and best among all possible options. The Table-2b shows the effect 
of using simply the smoothed Po without any scaling but using possible options. Even in this 
process noise free case due to the absence of scaling up the Po, the CRB ratio decreases due to 
the continuously decreasing smoothed Po|n with iterations and leads to incorrect consistency and 
spread factors in spite of deceptively good R ratio and cost function values. The above results 
show the importance of proper scaling and trimming the Po- When Q = 0 the estimate for R 
can use smoothed residue, filtered residue, and innovations with and without second order terms. 
In addition another estimate for R can be the difference between the measured and predicted dy¬ 
namics. It turns out that for process noise free case all the above seven options lead to statistically 
the same results. The Figs. 1, 2, and 3 show the variation of the estimated initial parameters and 
their variances through 20 iterations using the RRR (Q = 0). The x-axis for the above three plots 
is the cumulatively increasing time with iterations. For example in the simulated SMD system, 
the time instants vary from 0 to 10 seconds in the first iteration in small time steps of 0.1 seconds. 
The second iteration have time instants varying from 10 to 20 seconds, the third iteration with 20 
to 30 seconds and so on till 200 seconds for the 20 th iteration. The parameter and the uncertainty 
reach almost their final estimated values with three digit accuracy in about two and five iterations 
respectively. 

The correlation coefficient matrix is C l:j = , d ' 3 where d, 3 is the i th row j th column element of 

\dii X djj 

the parameter covariance matrix P© of size px p estimated by the EKF at the last time instant of 
the final iteration. For the spring, mass, and damper system a typical value for one data set with 
zero process noise is 


1.0000 

-0.0024 

-0.9387 

-0.0024 

1.0000 

0.1193 

-0.9387 

0.1193 

1.0000 
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The forces that balance the inertial force are dependent on velocity and displacement. Over a range 
of displacement that is excited the nonlinear spring constant (B 3 ) can be estimated only with high 
correlation with the linear spring constant (0i). If the range of displacement is increased then 
the above correlation reduces. However the damping coefficient (© 2 ) is estimated with modest 
correlation with other parameters since it is driven by velocity than the displacement. 


6.1b With Process noise (Q > 0) 

The Figs.4, 5, and 6 show respectively the variation of (i) the initially estimated 0 and its variance, 
(ii) R and Q and (iii) the different cost functions (J1-J8) through 100 iterations for the RRR (Q 
> 0) case. The cost functions J1-J3 correspond to the number of measurement (m=2). The J4 
in absence of process noise would correspond to the trace of the measurement noise R. The J5 is 
the negative log likelihood cost whose absolute value is shown in the plot. In presence of process 
noise, J6-J8 correspond to the number of states (n=2). 

It has been further extensively checked in the report, TR/EE2015/401 (2015) that the variation of 
the sample innovations, filtered residue and smoothed residue are consistent with the square root 
of their filter predicted variances (±cr bound). In the EKF approach since most of the quantities 
are Gaussian or approximated as quasi Gaussian and one would expect all the above quantities are 
close to being Gaussian and hence around one third of the total sample points to be outside the 
±cr bound. Similarly the injected and estimated measurement noise distributions during the final 
iteration were very close to each other as also their autocorrelations ideally expected to be close 
to the Kronecker delta function which provides confidence in the proposed filter algorithm. 

The Fig.7 show the predicted dynamics, filtered and smoothed estimate at the last iteration. 
There is an expected mismatch in the estimated dynamics (without the effect of R and Q) and 
the measurements made on the wandering dynamics indicating the presence of process noise. The 
next Figs. 8 to 13 show the variation of the noise R and Q estimates and the cost functions J1 
to J8 for MS, MT, and Bavedkar et al approaches. The important features to be noted are the 
MS though it converges leads to inaccurate estimates for the noise statistics and cost functions. 
The MT and Bavedkar et al approaches have in general no systematic variation and convergence 
of the estimates. The Table-3 shows the reference case in the second row the cost functions J1 to 
J3 and J6 to J8 are close to their expected values. The third row shows by using the smoothed 
Po without scaling and no stopping condition, the low spread factor is deceptive since there is no 
consistency in the parameter estimates with its covariance. 

Table 1: Different Adaptive Methods for Comparative Analysis 


Method 

Options for Pq 

Options for Q 

Options for R 

Remarks 

NR 

(See TR/EE2015/401) 

Not applicable 

Not applicable 

Using {Z - h(X e )) 

Applicable only 
for Q=0 case 

Reference 

(Present) 

Scaled up Pn\n (Eq.8) 
trimmed to [0,0;0,/] 

EM Algorithm (Eq.27) 
or DSDT (Eq.28) 
trimmed to [/, 0; 0, 0] 

Smoothed Residue 
(Zk - h(X m )) Eq.24 

Statistical 
equillibrium and 
CRBs are achieved 

IIM (Similar to 
Gemson’s method) 

IIM (Eq.9) 
trimmed to [0,0;0,/] 

MT Method (Eq.30) 
trimmed to [/, 0; 0, 0] 

Innovations 

(Zk - h(X k |fc_i)) Eq.26 

Initial R should be close 
to the true value. 

Bavdekar et al. 
method 

Smoothed P 0 \ N (Eq.7) 

EM Algorithm (Eq.27) 
with full matrix 

Smoothed Residue 
{z k - h(x k | W )) Eq.24 

Cost functions diverge 
after some iterations. 
CRBs are not achieved. 

MT Method* 

Scaled up Pn\n (Eq.8) 
trimmed to [0,0;0,/] 

MT Method (Eq.30) 
trimmed to [/, 0; 0, 0] 

Innovations 

(Z k - h(X k |fc_i)) Eq.26 

Cost function 
sometimes oscillate 

MS Method* 

Scaled up Pn\n (Eq.8) 
trimmed to [0,0;0,/] 

MS Method (Eq.29) 
trimmed to [/, 0; 0, 0] 

Filtered residue 
{Zk - h(X m j) Eq.25 

R can go to 
a very high value 


*Since in MS and MT methods the Po is not specified we have suggested the above 
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Table 2a: Sensitivity Study : (Q = 0). No. of iterations=20, No. of simulations=50. 


Study 

© ratio 
EKF/NR 

CRB ratio 
EKF/NR 

Consistency 

ratio-EKF 

Consistency 

ratio-NR 

Spread 

factor 

EKF 

Spread 

factor 

NR 

R ratio 
EKF/NR 

/X of 

J1-J5 

<J of 

J1-J5 

Po : Scaled up-[0,0;0,/] 

Q : EM-[/,0;0,0] 

R : EM-diag 

Reference adaptive EKF used for Q > 0 case gives extremely slow convergence of Q taking hundreds of iterations. 

Po : Scaled up-[0,0;0,/] 

Q : 10 _10 -[/,0;0,0] 

R : EM-diag 

1.0007 

1.0000 

0.9869 

1.0048 

0.9832 

0.9830 

1.0140 

1.3764 

1.0919 

1.0533 

1.3138 

1.0817 

0.8128 

1.5488 

14.5838 

0.8126 

1.5079 

14.5843 

1.0139 

1.0128 

1.9704 

1.9702 

1.9999 

0.0048 

-10.3911 

0.0512 

0.0512 

0.0013 

0.0008 

0.2243 

Po • Scaled up-diag 

Q : 10- lo -[/,0;0,0] 

R : EM-diag 

0.9998 

1.0024 

1.0255 

1.2439 

1.3877 

1.7525 

1.0518 

1.2966 

1.1761 

1.0533 

1.3138 

1.0817 

1.0223 

2.1729 

26.6449 

0.8126 

1.5079 

14.5843 

1.0117 

1.0061 

1.9530 

1.9534 

1.9998 

0.0047 

-10.3252 

0.0570 

0.0570 

0.0050 

0.0008 

0.2308 

Po : Scaled up-full 

Q : 10- lo -[/,0;0,0] 

R : EM-diag 

0.9991 

1.0023 

1.0471 

0.9487 

1.1580 

1.0710 

1.3638 

1.5696 

1.8936 

1.0533 

1.3138 

1.0817 

0.8922 

2.0313 

21.6826 

0.8126 

1.5079 

14.5843 

1.0045 

1.0006 

1.9668 

1.9670 

1.9999 

0.0047 

-10.4051 

0.0438 

0.0438 

0.0014 

0.0008 

0.2293 

Po : IIM-[0,0;0,/] 

Q : 10“ 10 -[/,0;0,0] 

R : EM-diag 

1.0014 

0.9990 

0.9667 

1.0066 

0.9942 

0.9792 

1.1296 

2.2338 

1.3654 

1.0533 

1.3138 

1.0817 

0.8997 

2.3360 

17.3808 

0.8126 

1.5079 

14.5843 

1.0228 

1.0204 

1.9690 

1.9710 

1.9998 

0.0050 

-10.2078 

0.0776 

0.0776 

0.0015 

0.0009 

0.2349 

Po : IIM-diag 

Q : lCT lo -[/,0;0,0] 

R : EM-diag 

0.9995 

1.0023 

1.0375 

1.2489 

1.386 

1.7504 

1.0476 

1.5883 

1.1881 

1.0533 

1.3138 

1.0817 

1.0205 

2.4209 

26.8883 

0.8126 

1.5079 

14.5843 

1.0160 

1.0102 

1.9530 

1.9542 

1.9998 

0.0049 

-10.1918 

0.0655 

0.0655 

0.0049 

0.0009 

0.2403 

P 0 : IIM-full 

Q : 10- 10 -[/,0;0,0] 

R : EM-diag 

0.9997 

1.0025 

1.0340 

1.2493 

1.7885 

1.7501 

1.0479 

1.6662 

1.1851 

1.0533 

1.3138 

1.0817 

1.0185 

2.4988 

26.7966 

0.8126 

1.5079 

14.5843 

1.0178 

1.0116 

1.9540 

1.9532 

1.9999 

0.0050 

-10.2001 

0.0656 

0.0656 

0.0050 

0.0009 

0.2420 




Table 2b: Sensitivity Study : (Q = 0). No. of iterations=20, No. of simulations=50. 


Study 

© ratio 
EKF/NR 

CRB ratio 
EKF/NR 

Consistency 

ratio-EKF 

Consistency 

ratio-NR 

Spread 

factor 

EKF 

Spread 

factor 

NR 

R ratio 
EKF/NR 

H of 

J1-J5 

<J of 

J1-J5 

Po : Smoothed-[0,0;0,/] 

Q : 10~ 10 -[/,0;0,0] 

R : EM-diag 

1.0006 

1.0002 

0.9911 

0.2361 

0.2306 

0.2315 

4.2997 

5.7348 

4.5352 

1.0533 

1.3138 

1.0817 

0.5240 

1.0375 

9.5986 

0.8126 

1.5079 

14.5843 

0.9990 

0.9997 

1.9982 

1.9982 

1.9999 

0.0048 

-10.5276 

0.0009 

0.0009 

0.0001 

0.0008 

0.2240 

Po : Smoothed-diag 

Q : 10 -1 °-[/,0;0,0] 

R : EM-diag 

1.0021 

0.9999 

0.9560 

0.0991 

0.2380 

0.0993 

10.8182 

7.4621 

16.1755 

1.0533 

1.3138 

1.0817 

0.5565 

1.5098 

14.2577 

0.8126 

1.5079 

14.5843 

0.9904 

0.9872 

1.9960 

1.9960 

1.9997 

0.0047 

-10.5493 

0.0016 

0.0016 

0.0002 

0.0008 

0.2294 

Po : Smoothed-full 

Q : 10- 10 -[/,0;0,0] 

R : EM-diag 

0.9997 

1.0011 

1.0260 

0.3089 

0.3272 

0.4565 

4.1863 

5.4617 

4.3984 

1.0533 

1.3138 

1.0817 

0.6813 

1.5528 

18.0659 

0.8126 

1.5079 

14.5843 

0.9891 

0.9859 

1.9970 

1.9971 
1.9998 
0.0047 

-10.5523 

0.0011 

0.0011 

0.0003 

0.0008 

0.2290 


Table 3: Sensitivity Study : (Q > 0). No. of iterations=100, No. of simulations=50. 


Study 

© ratio 
EKF/True 

Consistency 

ratio-EKF 

Spread 

factor 

EKF 

R ratio 
EKF/True 

Q ratio 
EKF/True 

fJ, of 

J1-J8 

<7 of 

J1-J8 

Po : Smoothed-full 

O • PM full 

Extended EM algorithm (Bavdekar et al. (2011) : Cost functions diverges after few iterations. 

R : EM-full 

There is a need for a precise stopping condition without which unity ratios cannot be achieved. 







1.9650 

0.0224 







1.9700 

0.0217 







1.9982 

0.0091 

Po : Scaled up-[0,0;0,/] 

0.9933 

1.0327 

7.9434 

0.9450 

1.0648 

0.0709 

0.0396 

Q : EM-[/,0;0,0] 

1.0201 

1.0772 

23.5655 

n qi 

i nooo 


n oosi 

R : EM-diag 

1.0764 

0.9128 

98.4334 

u.y Too 

r.uuzz 

-O. 1 OOO 

1.9439 

U.ZZol 

0.0422 







1.9533 

0.0618 







1.9585 

0.0373 







1.9988 

0.0241 







1.9996 

0.0241 

Po : Smoothcd-diag 

1.0210 

13.3560 

3.9686 

0.9502 

1.0286 

1.9998 

0.0656 

0.0076 

0.0325 

Q : EM-diag 

1.0018 

10.0828 

13.6349 

0.9703 

0.8676 

-8.9374 

0.2290 

R : EM-diag 

0.6806 

13.6464 

51.6473 



2.0031 

0.0147 







2.0027 

0.0239 







1.9976 

0.0282 


































Figure 4: Variation of initial parameters 0 O (continuous) and its P 0 (dashed) with iterations 



Figure 5: Variation of Q (dashed) and R (continuous) with iterations 



Figure 6: Variation of different costs (J1-J8) with iterations 













a 



b 



Figure 7: Comparison of the predicted dynamics h(Xd), posterior h(X+), smoothed h(Xs) 
and the measurement Z corresponding to the (a) displacement (b) velocity 



Figure 8: Variation of Q (dashed) and R (continuous) with iterations using MS method 



Figure 9: Variation of different costs(Jl-J8) with iterations using MS method 
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7 Conclusions 


A comparative study between the existing adaptive techniques suggested by Myers and Tapley, 
Mohamed and Schwarz and Bavdekar et al. is carried out and a reference recursive recipe (RRR) 
for tuning the Kalman filter is proposed. A new statistic for the estimation of Q based upon 
the difference between the stochastic and dynamic trajectories (DSDT) was introduced based on 
extended EM method. The different cost functions (J1-J8) helps the user to move away from 
deceptive to decisive convergence. The proposed RRR achieves the Cramer Rao Bound (CRB) of 
the unknown parameters and provide statistically equilibrium solution for the adaptive filtering 
problem in an easy and simple way without any need for direct optimization in a few filter iterations 
through the data. In the menacing problem of tuning the filter statistics, the importance of Po 
has been demonstrated and the notorious Q has been successfully handled. This RRR is believed 
to be sufficiently simple and general to be used in any Kalman filtering problem. 
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A List of Symbols* 


x 


0 

X = [x, 0] T 

Zb 

X 0 , P o 

R, Q 
J0-J8 

Xk\k~i 

Xk\k 

Xk\N 

Xd k \N 

Pk\k—1 

Pk\k 

Pk\N 

Pk,k-1\N 

K k 


Xk\N 

Fk-1 = 


df_ 

dX 


X=X k _!\ k _i 


State vector of size n x 1 

Parameter vector of size px 1 

Augmented state vector of size (n + p) x 1 

Measurement Vector of size to x 1 at discrete time index ‘k’ 

Initial state and its covariance 

Measurement and Process noise covariance matrix 

Cost functions 

Prior state estimate at time index k based on data upto k-1 
Posterior state estimate at time index k based on data upto k 
Smoothed state estimate at time index k based on data upto N 
Dynamical state estimate at time index k based on data upto N 
Prior state covariance matrix at time index k given data upto k-1 
Posterior state covariance matrix at time index k given data upto k 
Smoothed state covariance matrix at time index k given data upto N 
Lag one covariance of the smoothed state estimate 
Kalman Gain based on data upto time index k 
Smoothed Gain based on all data upto time index N 

State Jacobian matrix using posterior state estimate 


Pk-1\N — 

Fdk-i\N = 


0/ 

ax 


df_ 

ax 


' dh ' 

.ax. x—x k k .j 
tzt _ f dh 1 

n k\k - lax\x=x k[k 
Hk\N = [§x]x=x klN 


X—X k _n N 
J X=Xd k - i|jv 


State Jacobian matrix using smoothed state estimate 
State Jacobian matrix using dynamical state estimate 
Measurement Jacobian matrix using prior state estimate 
Measurement Jacobian matrix using posterior state estimate 
Measurement Jacobian matrix using smoothed state estimate 


*Most other symbols are explained as and when they occur. 


26 



References 

Alspach, D. (1974) A parallel filtering algorithm for linear systems with unknown time varying noise statistics. 
IEEE Trans. Auto. Cont., 19(5): 552-556. 

Ananthasayanam, M. R., Suresh, H. S. and Muralidharan, M. R. (2001) GUI based software for teaching 
parameter estimation technique using MMLE. Report 2001 FM 1,Project-JATP/MRA/053,IISc-Bangalore. 

Ananthasayanam, M. R., Anilkumar, A. K. and Subba Rao, P. V. (2006) New Approach for the Evolution 
and Expansion of Space Debris Scenario. Journal of Spacecraft and Rockets, Vol. J3, No. 6 : pp. 1271-1282. 

Ananthasayanam, M. R. and Bharadwaj, K. M. (2011) The Philosophy, Principles, and Practice of Kalman 
Filter since Ancient Times to the Present, ” J5th IAA History of Astronautics Symposium. 

Anilkumar, A. K. (2000) Application of Controlled Random Search Optimisation Technique in MMLE with 
Process Noise. MSc Thesis, Dept, of Aerospace Engineering IISc, Bangalore. 

Bar-Shalom, Y. and Li, X. (1993) Estimation and Tracking : Principles, Techniques and Software. Boston : 
Artech House Radar Library. 

Bar-Shalom, Y., Rong Li, X. and Kirubarajan, T. (2001) Estimation with Applications To Tracking and 
Navigation, Theory, Algorithm and Software. John Wiley and Sons. Inc. 

Bavdekar, V. A., Deshpande, A. P. and Patwardhan, S. C. (2011) Identification of process and measurement 
noise covariance for state and parameter estimation using extended Kalman filter. Journal of Process control, 
21 : 585-601. 

Belanger, P. R. (1974) Estimation of noise covariances for a linear time-varying stochastic process. Automatica 
Volume 10, Issue 3, May 1974, Pages 267-275. 

Bohlin, T. (1976) Four cases of identification of changing systems. System Identification: Advances and Case 
Studies, Academic Press, 1st edition. 

Bohn, C. (2000) Recursive Parameter Estimation for Non Linear Continuous time systems through Sensitivity- 
Model based Adaptive Filters. PhD Thesis, Dept, of Electrical Engineering and Information Science. 

Brown, R. and Hwang, P. (2012) Introduction to Random Signals and Applied Kalman Filtering, With 
MATLAB Exercises, 4f h Edition. John Wiley and Sons, Inc. 

Candy, J.V. (1986) Signal Processing The Model based approach. Mcgraw Hill Series in Electrical and 
Computer Engineering. 

Carew, B. and Belanger, P. R. (1973) Identification of optimum filter steady state gain for systems with 
unknown noise covariances. IEEE Trans. Auto. Cont., 18(6): 582-587. 

Costagli, M. and Kuruoglu, E. E. (2007) Image separation using particle filters. Digital Signal Processing 
17, pp. 935-946. 

Datta, G.S. and Mukerjee, R. (2005) Probability Matching Priors: Higher Order Asymptotics. Series: Lecture 
Notes in Statistics, Vol. 178. 

Evensen, G. (2009) Data Assimilation: The Ensemble Kalman Filter, Second Edition. Springer Verlag . 

Federer, W. T. and Murthy. B. R. (1998) Kalman Filter Bibliography : Agriculture, Biology, and Medicine. 
Technical Report BU-1436-M Department of Biometrics, Cornell University, Ithaca. 

Fruhwirth, R., Regier, M., Bock, R. K., Grote H. and Notz, D. (2000) Data Analysis Techniques for High- 
Energy Physics. Cambridge Monographs on Particle Physics, Nuclear physics and Cosmology. 

Gemson, R. M. 0. (1991) Estimation Of Aircraft Aerodynamic Derivatives Accounting For Measurement 
And Process Noise By EKF Through Adaptive Filter Tuning. PhD Thesis, Dept, of Aerospace Engineering 
IISc, Bangalore. 

Gemson, R. M. O. and Ananthasayanam, M. R. (1998) Importance of Initial State Covariance Matrix for the 
Parameter Estimation Using Adaptive Extended Kalman Filter. AIAA-98-4153, pp. 94-104- 

Grewal, M. S., Lawrence, R.W. and Andrews, A. P., (2007) Global Positioning Systems, Inertial Navigation, 
and Integration, Second Edition. John Wiley & Sons, Inc., Publication. 

Hilborn, C. and Lainiotis, D. (1969) Optimal estimation in the presence of unknown parameters. IEEE 
Trans. Systems, Science, and Cybernetics, 5(1):38~43. 

Julier, S.J., Uhlmann, J.K., Durrant-Whyte, H.F. (1995) A New Approach for Filtering Nonlinear Systems. 
Proceedings of the 1995 American Control Conference, vol. 3, pp. 1628- 1632 . 

Kailath, T. (1970) An Innovation Approach To Detection and Estimation Theory. Proceedings of the IEEE, 
Vol. 58, Issue: 5, pp. 680 -695. 

Kalman, R. E. (1960) A New Approach to linear Filtering and Prediction Problems. Transactions of the 
ASME-Journal of Basic Engineering, 82 (Series D) : pp. 35~45. 

Kalman, R. E., Bucy, R. S. (1961) New Results in Linear Filtering and Prediction Theory. J. Fluids Eng. 
83(1), 95-108. 

Kashyap, R. (1970) Maximum likelihood identification of stochastic linear systems. IEEE Trans. Auto. Cont., 
15(1), pp. 25-34. 

Klein, V. and Morelli, E.A. (2006) Aircraft System Identification. Theory and Practice. AIAA Edu. Series. 
Kleusberg, A. and Teunissen, P. J. G. (1996) GPS for Geodesy : First Edition. Springer Verlag . 

Lau, T. and Lin, K. (2011) Evolutionary tuning of sigma-point Kalman filters. IEEE International Conference 
on Robotics and Automation (ICRA), pp. 771-776 . 

Ljung, L. (1979) Asymptotic behaviour of the EKF as a parameter estimator for linear systems. IEEE trans. 
Automatic control, Vol. AC 24, PP- 36-50. 


27 



Ljungquist, D. and Balchen, J. G. (1994) Recursive prediction error methods for online estimation in nonlinear 
state space models. Modeling, Identification and Control, Vol. 15 No. 2 : pp. 109-121. 

Maine, R. E. and Iliff, K. W. (1981) Programmer’s manual for MMLE3, a general Fortran program for 
Maximum Likelihood parameter estimation”, NASA TP-1690. 

Manika, S., Bhaswati, G. and Ratna, G. (2014) Robustness and Sensitivity Metrics for Tuning the Extended 
Kalman Filter. IEEE Trans, on Instrumentation and Measurement, Vol. 63, No. 4 [, pp. 964-971. 

Maybeck, P. S. (1979) Stochastic Models, Estimation, and Control: Volume 1, ” New York: Academic Press. 

Mazor, E., Averbuch, A., Bar-Shalom, Y., Dayan, J. (1998) Interacting Multiple Model methods in Target 
Tracking: A Survey. IEEE Transactions on Aerospace and Electronic Systems, Vol. 34 - pp- 103 - 123. 

Mehra, R. K. (1970) On the identification of variances and adaptive Kalman filtering. IEEE Trans. Automat. 
Control, Vol. 15, N.2, pp. 175-184- 

Mehra, R. (1972) Approaches to adaptive filtering. IEEE Trans. Auto. Cont., 17, pp. 903-908. 

Mehrotra, K. Mahapatra, P. R. (1997) A Jerk Model for Tracking Highly Maneuvering Targets. IEEE 
Transactions on Aerospace and Electronic Systems, VOL. 33, NO. 4- 

Mohamed, A. H. and Schwarz, K. P. (1999) Adaptive Kalman Filtering for INS/GPS. Journal of Geodesy, 
Volume 73, Issue 4> PP 193-203. 

Myers, K. A. and Tapley, B. D. (1976) Adaptive Sequential Estimation with Unknown Noise Statistics. IEEE 
Transactions on Automatic Control, Vol. AC 21 : pp. 520-525. 

Neethling, C. and Young, P. (1974) Comments on Identification of optimum filter steady state gain for systems 
with unknown noise covariances. IEEE Trans. Auto. Cont., 19(5):623-625. 

Odelson, B. J., Lutz, A. and Rawlings, J. B. (2006) The autocovariance-least squares method for estimating 
covariances: application to model-based control of chemical reactors. IEEE Trans. Control Syst. Technol. 
Vol. 14, No. 3, pp. 532-540. 

Oshman, Y. and Shaviv, I. (2000) Optimal Tuning of a Kalman Filter Using Genetic Algorithm. AIAA Paper 
2000-4558. 

Oussalah M., De Schutter J. (2000) Adaptive Kalman Filter for Noise Identification. International Conference 
on Noise and Vibration Engineering, ISMA25 : pp. 1225-1232. 

Powell, T. D. (2002) Automated tuning of an Extended Kalman Filter Using the Downhill Simplex Algorithm. 
J. Guidance, Control, and Dynamics, Vol. 25, No. 5, pp. 901-908. 

Rauch, H. E., Tung, F. and Striebel, C. T. (1965) Maximum Likelihood Estimates of Linear Dynamic Systems. 
AIAA Journal, Vol. 3, No. 8 , pp. 1445-1450. 

Shumway, R. H. and Stoffer, D. S. (1982) An approach to time series smoothing and forecasting using the 
EM algorithm. J. Time Series Anal., 3, pp. 253-264- 

Shumway, R. H., Stoffer, D. S. (2000) Time Series Analysis and its Applications. Springer, Verlag, NY. 

Shyam, M. M. (2014) An Iterative Tuning Strategy for Achieving Cramer Rao Bound Using Extended Kalman 
Filter for a Parameter Estimation Problem. MTech Thesis, Dept, of Electrical Engineering, I IT Kanpur. 

Shyam, M. M., Naren Naik, Gemson, R. M. O, Ananthasayanam, M. R. (2015) Introduction to the Kalman 
Filter and Tuning its Statistics for Near Optimal Estimates and Cramer Rao Bound. TR/EE2015/401, Dept, 
of Electrical Engineering, IIT Kanpur, http: //arxiv. org/abs/1503. 04313. 

Valappil, J. and Georgakis, C. (2000) Systematic Estimation of State Noise Statistics for Extended Kalman 
Filters. AI Che Journal Vol. 45, No. 2, pp. 292308. 

Visser, H. and Molenaar, J. (1988) Kalman Filter Analysis in Dendroclimatology. Biometrics, pp 929-940. 
Wells, C. (1996) The Kalman Filter in Finance. Springer-Science+Business Media, BV. 

Zagrobelny, M. A. and Rawlings, J. B. (2014) Identification of Disturbance Covariances Using Maximum 
Likelihood Estimation. TR No. 2014-02, Department of Chemical and Biological Engineering, University 
of Wisconsin-Madison. 


28 



